ABSTRACT

Forecasts of statistical constraints on model parameters using the Fisher matrix
abound in many fields of astrophysics. The Fisher matrix formalism involves the
assumption of Gaussianity in parameter space and hence fails to predict complex
features of posterior probability distributions. Combining the standard Fisher matrix
with Box-Cox transformations, we propose a novel method that accurately predicts
arbitrary posterior shapes. The Box-Cox transformations are applied to parameter
space to render it approximately multivariate Gaussian, and the Fisher matrix
calculation is then performed on the transformed parameters. We demonstrate that,
after the Box-Cox parameters have been determined from an initial likelihood
evaluation, the method correctly predicts changes in the posterior when various
parameters of the experimental setup and the data analysis are varied, at marginally
higher computational cost than a standard Fisher matrix calculation. We apply the
Box-Cox-Fisher formalism to forecast cosmological parameter constraints by future
weak gravitational lensing surveys. The characteristic non-linear degeneracy between
the matter density parameter and the normalisation of matter density fluctuations is
reproduced for several cases, and the capability of weak lensing three-point
statistics to break this degeneracy is investigated. Possible applications of Box-Cox
transformations of posterior distributions are discussed, including the prospects for
performing statistical data analysis steps in the transformed, Gaussianised
parameter space.

Key words: methods: data analysis - methods: analytical - methods: statistical - 
cosmological parameters - gravitational lensing: weak 



1 INTRODUCTION 

In recent years many fields of astrophysics have seen a transition
towards increasingly large experiments and surveys. The level of
complexity and the costs are rising alongside, requiring careful
planning and assessment of the expected performance of the envisaged
project at all stages. In forecasts of the statistical constraints on
model parameters by future experiments the Fisher matrix (Fisher 1935;
Tegmark et al. 1997) has proven to be indispensable (e.g. Albrecht
et al. 2009).

Its ubiquity can largely be attributed to the low computational cost
of a Fisher matrix calculation compared to a full mock likelihood
analysis, in particular if the data set to be analysed and the number
of parameters to be inferred are large. This simplicity comes at the
price of a twofold assumption of Gaussianity in the derivation of the
Fisher matrix expressions (Tegmark et al. 1997). First, the data are
assumed to be distributed according to a multivariate Gaussian.
However, this assumption is shared with the majority of full
likelihood analyses to date, although precision measurements may
require more complicated forms (e.g. Bond et al. 2000; Hartlap et al.
2009). Second, since the Fisher matrix is defined as the expectation
value of the Hessian of the log-likelihood in parameter space, it can
only fully represent a Gaussian posterior whose logarithm has
constant curvature.

Therefore the confidence regions on model parameters derived from a
Fisher matrix are inevitably elliptical. They can describe the
posterior distribution close to the point of maximum likelihood and
indicate linear degeneracies among parameters via the ellipticity of
confidence regions. Fisher matrix analyses fail to identify the shape
of the posterior away from its maximum, as well as to detect
non-linear dependencies of parameters. However, non-linear model
parameter degeneracies are common, and the attempt to minimise or
break them can drive the design of experiments. Hence it is desirable
to go beyond the assumption of a Gaussian posterior in forecasts for
the advanced stages of upcoming precision measurements.

In this work we propose to combine Fisher matrix forecasts with
Box-Cox transformations of parameter space to obtain accurate
expectations of posterior distributions. Box & Cox (1964) introduced a
parametrised set of power transformations with which a wide range of
data can be transformed to follow a Gaussian distribution to good
approximation. We will apply these transformations to model
parameters, instead of data, in order to modify a given posterior into
a multivariate Gaussian distribution for which a Fisher matrix
analysis is exact. After an inverse Box-Cox transformation the Fisher
matrix results will then accurately describe the original posterior.
To determine the free parameters of the Box-Cox transformation, the
original posterior needs to be sampled, and hence an initial mock
likelihood analysis has to be run.

We will demonstrate this method with an example from cosmology.
Several ambitious surveys are currently planned or designed^1 that are
going to measure the parameters of the cosmological standard model,
particularly those of dark matter and dark energy, with high
precision. These experiments will investigate several cosmological
probes of the large-scale structure of the Universe, the potentially
most powerful one being weak gravitational lensing of distant galaxies
(Albrecht et al. 2006; Peacock et al. 2006). Weak lensing features a
characteristic non-linear degeneracy between the two best-constrained
parameters $\Omega_{\rm m}$ (mean matter density) and $\sigma_8$
(normalisation of matter density fluctuations) as they both govern the
overall amplitude of the signal (see e.g. Hoekstra et al. 2006;
Schrabback et al. 2010). Hence a mock weak lensing survey provides an
excellent test case, but we emphasise that the method outlined is
applicable to any prediction for model parameter constraints.

The paper is organised as follows. Section 2 details the principles of
Box-Cox transformations, our strategies to determine optimal Box-Cox
parameters, and the combined Box-Cox-Fisher formalism. In Section 3 we
investigate the performance of the proposed method for a mock weak
lensing experiment, comparing different variants in the implementation
and quantifying the universality of the Box-Cox-Fisher formalism. We
apply this formalism to a test of the degeneracy-breaking capabilities
of weak lensing higher-order statistics in Section 4, before we
summarise and conclude on our findings in Section 5.



2 BOX-COX TRANSFORMATIONS OF 
PARAMETER SPACE 

Power transformations such as the inverse and square-root
transformation, or logarithmic transformations, are popular choices to
render the distribution of data more Gaussian. The Box-Cox
transformation unites these cases with a single free parameter per
dimension and is hence widely used in various areas of science.
Astrophysical applications are rare; one example is the work by
Dineen & Coles (2005), who tested cosmic microwave background data for
Gaussianity.



^1 These include e.g. the Large Synoptic Survey Telescope
(http://www.lsst.org), the NASA satellite WFIRST
(http://wfirst.gsfc.nasa.gov), and the ESA satellite Euclid
(http://sci.esa.int/euclid).



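For an $N_{\rm p}$-dimensional variable $\boldsymbol{p}$ the Box-Cox
transformation in each dimension $\mu = 1, .., N_{\rm p}$ reads
(Box & Cox 1964)

$$\bar{p}_\mu(\lambda_\mu, a_\mu) = \begin{cases} \left[ (p_\mu + a_\mu)^{\lambda_\mu} - 1 \right] / \lambda_\mu & \text{if } \lambda_\mu \neq 0 \\ \ln (p_\mu + a_\mu) & \text{if } \lambda_\mu = 0 \end{cases} ,$$ (1)

where the normalisation has been chosen such that the transformation
is continuous in the parameter $\lambda_\mu$ at $\lambda_\mu = 0$. We
allow for a shift $a_\mu$ as a second free parameter in each
dimension. Note that we denote transformed quantities by a bar and
drop the dependence on the Box-Cox parameters
$(\boldsymbol{\lambda}, \boldsymbol{a})$ unless it needs to be made
explicit.

Usually, equation (1) is applied to the elements of a data vector, but
we will henceforth understand $p_\mu$ as the parameters of an
$N_{\rm p}$-dimensional parameter space. Then the transform of a given
posterior distribution $\mathcal{P}(\boldsymbol{p})$ is given by

$$\bar{\mathcal{P}}(\bar{\boldsymbol{p}}) = \mathcal{P}(\boldsymbol{p}) \, J(\boldsymbol{p}, \bar{\boldsymbol{p}}) ,$$ (2)

with the Jacobian

$$J(\boldsymbol{p}, \bar{\boldsymbol{p}}) = \left| \frac{\partial \boldsymbol{p}}{\partial \bar{\boldsymbol{p}}} \right| = \prod_{\mu=1}^{N_{\rm p}} (p_\mu + a_\mu)^{1 - \lambda_\mu} .$$ (3)

The second equality follows directly from equation (1). The first goal
is to determine the set of $2 N_{\rm p}$ parameters,
$(\boldsymbol{\lambda}, \boldsymbol{a})$, such that the transformed
posterior, $\bar{\mathcal{P}}(\bar{\boldsymbol{p}})$, is a
multivariate Gaussian to good approximation.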
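
The transformation, its inverse, and the Jacobian can be sketched in a
few lines of Python. This is a minimal illustration only: the function
names are assumptions made for this example and do not refer to any
code used in this work, and the Jacobian is written in the form
$\prod_\mu (p_\mu + a_\mu)^{1-\lambda_\mu}$ that follows from
differentiating the transformation.

```python
import math


def box_cox(p, lam, a=0.0):
    """Box-Cox transform of one parameter value p; requires p + a > 0."""
    if p + a <= 0.0:
        raise ValueError("p + a must be positive")
    if lam == 0.0:
        return math.log(p + a)
    return ((p + a) ** lam - 1.0) / lam


def inverse_box_cox(p_bar, lam, a=0.0):
    """Inverse transform: recover p from the transformed value p_bar."""
    if lam == 0.0:
        return math.exp(p_bar) - a
    return (lam * p_bar + 1.0) ** (1.0 / lam) - a


def jacobian(p, lams, shifts):
    """J = prod_mu (p_mu + a_mu)^(1 - lambda_mu), i.e. |dp / dp_bar|."""
    j = 1.0
    for p_mu, lam, a in zip(p, lams, shifts):
        j *= (p_mu + a) ** (1.0 - lam)
    return j
```

For $\lambda_\mu = 1$ the transformation is a pure shift and the
Jacobian factor is unity, so a posterior that is already Gaussian is
left unchanged in shape.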

2.1 Optimal Box-Cox parameters 

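Suppose a random sample with $n$ elements, i.e.
$\{p_{\mu,1}, .., p_{\mu,n}\}$ for every $\mu = 1, .., N_{\rm p}$, is
drawn from the posterior $\mathcal{P}(\boldsymbol{p})$, for instance
via Monte-Carlo sampling techniques. If the Box-Cox transformed
posterior is indeed Gaussian, the distribution is given by

$$\bar{\mathcal{P}}(\bar{\boldsymbol{p}}) = \mathcal{P}(\boldsymbol{p}) \, J(\boldsymbol{p}, \bar{\boldsymbol{p}}) = \frac{1}{\sqrt{(2\pi)^{N_{\rm p}} \det {\rm Cov}(\bar{\boldsymbol{p}})}} \exp \left\{ -\frac{1}{2} \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right)^\tau {\rm Cov}^{-1}(\bar{\boldsymbol{p}}) \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right) \right\}$$ (4)

and has only the Box-Cox parameters, and the mean
$\bar{\boldsymbol{p}}_{\rm max}$ and covariance

$${\rm Cov}(\bar{\boldsymbol{p}}) = \left\langle \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right) \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right)^\tau \right\rangle$$ (5)

of the Gaussian as free parameters. Since
$\bar{\mathcal{P}}(\bar{\boldsymbol{p}})$ is assumed Gaussian, one can
employ the standard maximum likelihood estimators for the covariance
and mean. The latter simply implies
$\bar{\boldsymbol{p}} = \bar{\boldsymbol{p}}_{\rm max}$, so that the
exponential in equation (4) is unity. Consequently one obtains the
following concentrated log-likelihood for the Box-Cox parameters (for
details see Box & Cox 1964; Velilla 1993),

$$\mathcal{L}_{\rm max}(\boldsymbol{\lambda}, \boldsymbol{a}) = -\frac{n}{2} \ln \det {\rm Cov}_{\rm ML} \left[ \bar{\boldsymbol{p}}(\boldsymbol{\lambda}, \boldsymbol{a}) \right] + \sum_{\mu=1}^{N_{\rm p}} (\lambda_\mu - 1) \sum_{i=1}^{n} \ln (p_{\mu,i} + a_\mu) ,$$ (6)

up to an irrelevant constant. We have added the subscript ML to
emphasise that the maximum likelihood estimate for the covariance
based on the sample is to be used. Maximising this likelihood for a
given sample should then return Box-Cox parameters
$(\boldsymbol{\lambda}, \boldsymbol{a})$ that render
$\bar{\mathcal{P}}(\bar{\boldsymbol{p}})$ as close to Gaussian as
possible.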
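
For a one-dimensional sample the concentrated log-likelihood reduces
to $-\frac{n}{2} \ln \sigma^2_{\rm ML} + (\lambda - 1) \sum_i \ln (p_i + a)$,
with $\sigma^2_{\rm ML}$ the maximum likelihood variance of the
transformed sample. The following sketch assumes this one-dimensional
form, uses a hypothetical log-normal toy sample in place of a real
posterior sample, and scans $\lambda$ on a coarse grid at fixed
$a = 0$. All names are illustrative.

```python
import math
import random


def box_cox(p, lam, a=0.0):
    """Box-Cox transform of one value; requires p + a > 0."""
    if lam == 0.0:
        return math.log(p + a)
    return ((p + a) ** lam - 1.0) / lam


def concentrated_loglike(sample, lam, a=0.0):
    """One-dimensional concentrated log-likelihood, up to a constant."""
    n = len(sample)
    transformed = [box_cox(p, lam, a) for p in sample]
    mean = sum(transformed) / n
    # Maximum likelihood variance estimate (normalised by n, not n - 1).
    var_ml = sum((t - mean) ** 2 for t in transformed) / n
    log_jacobian = sum(math.log(p + a) for p in sample)
    return -0.5 * n * math.log(var_ml) + (lam - 1.0) * log_jacobian


# Toy sample (an assumption for illustration): log-normal, so that the
# logarithm, i.e. lambda = 0, should be close to optimal.
rng = random.Random(1)
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(2000)]

lambda_grid = [i / 10.0 for i in range(-20, 21)]
best_lambda = max(lambda_grid, key=lambda lam: concentrated_loglike(sample, lam))
```

In practice one would maximise over $\lambda$ and $a$ jointly with a
numerical optimiser; the grid scan merely keeps the example free of
dependencies.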

If $N_{\rm p}$ is small or the likelihood evaluation computationally
inexpensive, it may be more convenient and faster to obtain the
distribution $\mathcal{P}(\boldsymbol{p})$ directly on a grid instead
of using a random sample (see also Frommert et al. 2010). In this case
the transformed posterior can be computed readily via equation (2) for
any combination of Box-Cox parameters. The optimal parameter
combination is then found by comparing
$\bar{\mathcal{P}}(\bar{\boldsymbol{p}})$ to a Gaussian distribution
with the same mean and covariance, e.g. by minimising the
Kullback-Leibler divergence

$$D_{\rm KL} = \int {\rm d}^{N_{\rm p}} p \; \mathcal{P}_{\rm ref}(\boldsymbol{p}) \ln \frac{\mathcal{P}_{\rm ref}(\boldsymbol{p})}{\mathcal{P}(\boldsymbol{p})} \approx \sum_i \mathcal{P}_{\rm ref}(\boldsymbol{p}_i) \ln \frac{\mathcal{P}_{\rm ref}(\boldsymbol{p}_i)}{\mathcal{P}(\boldsymbol{p}_i)} \prod_{\mu=1}^{N_{\rm p}} \Delta p_\mu .$$ (7)

In the second step we have replaced the integration with a sum over
all points of the grid on which the distributions are evaluated,
assuming a spacing of the points by $\Delta p_\mu$ in dimension $\mu$.
However, we will use $D_{\rm KL}$ to assess the accuracy of the
results of our method in Section 3, so that we use a different
statistic to determine the Box-Cox parameters.
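
The discretised Kullback-Leibler divergence is straightforward to
evaluate once both distributions are tabulated on the same grid. The
sketch below is an illustration under simple assumptions: a
one-dimensional grid, two Gaussians as stand-ins for the reference and
test distributions, and a helper name chosen for this example. For two
unit-variance Gaussians whose means differ by one, the analytic result
is $D_{\rm KL} = 0.5$.

```python
import math


def kl_divergence_grid(p_ref, p, cell_volume):
    """Sum_i P_ref(p_i) ln[P_ref(p_i) / P(p_i)] times the grid cell volume."""
    total = 0.0
    for ref_value, value in zip(p_ref, p):
        if ref_value > 0.0:  # cells with P_ref = 0 do not contribute
            total += ref_value * math.log(ref_value / value)
    return total * cell_volume


def gaussian(x, mean, sigma):
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm


dx = 0.01
grid = [-10.0 + i * dx for i in range(2101)]
reference = [gaussian(x, 0.0, 1.0) for x in grid]
shifted = [gaussian(x, 1.0, 1.0) for x in grid]

d_same = kl_divergence_grid(reference, reference, dx)
d_shift = kl_divergence_grid(reference, shifted, dx)
```

In more than one dimension the same function applies if the grid is
flattened and `cell_volume` is set to the product of the spacings.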

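Two one-dimensional distributions can be compared via their quantiles
in a QQ-plot. If both distributions are Gaussian, the quantile pairs
lie on a straight line and hence Pearson's correlation coefficient of
the quantiles,

$$r_{\rm QQ} = \frac{\sum_i \left( Q_i^{\rm BC} - \langle Q^{\rm BC} \rangle \right) \left( Q_i^{\rm Gauss} - \langle Q^{\rm Gauss} \rangle \right)}{\sqrt{\sum_i \left( Q_i^{\rm BC} - \langle Q^{\rm BC} \rangle \right)^2 \, \sum_i \left( Q_i^{\rm Gauss} - \langle Q^{\rm Gauss} \rangle \right)^2}} ,$$ (8)

should attain unity. Here, $Q^{\rm BC}$ denotes the quantiles of the
Box-Cox transformed distribution and $Q^{\rm Gauss}$ the quantiles of
a zero-mean unit-variance Gaussian distribution, the latter readily
computed from the cumulative distribution function; angular brackets
denote the mean over the quantiles used. In practice we use
30-quantiles to calculate equation (8). An advantage of $r_{\rm QQ}$
over $D_{\rm KL}$ is that it is independent of the mean and variance
of the transformed distribution, which therefore do not have to be
re-computed for every change in Box-Cox parameters.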
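
A QQ-plot correlation coefficient of this kind can be sketched as
follows. The example is an illustration, not the implementation used
here: it estimates the 30-quantiles of a random sample by a simple
nearest-rank rule, takes the Gaussian quantiles from the inverse
cumulative distribution function in Python's standard library, and
uses a hypothetical log-normal sample as test case.

```python
import math
import random
from statistics import NormalDist


def sample_quantiles(sample, n_q=30):
    """Interior n_q-quantiles of a sample (nearest-rank estimate)."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[min(n - 1, int(k * n / n_q))] for k in range(1, n_q)]


def gaussian_quantiles(n_q=30):
    """Matching quantiles of a zero-mean, unit-variance Gaussian."""
    standard_normal = NormalDist(0.0, 1.0)
    return [standard_normal.inv_cdf(k / n_q) for k in range(1, n_q)]


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_qq(sample, n_q=30):
    return pearson(sample_quantiles(sample, n_q), gaussian_quantiles(n_q))


rng = random.Random(2)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
r_raw = r_qq(raw)                          # skewed sample
r_log = r_qq([math.log(p) for p in raw])   # Box-Cox with lambda = 0, a = 0
```

Because Pearson's coefficient is invariant under shifts and rescalings
of either argument, no standardisation of the transformed sample is
needed before the comparison.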

Since $r_{\rm QQ}$ can only be applied to one-dimensional
distributions, we determine the Box-Cox parameters in every dimension
of parameter space from the marginalised posterior in that dimension.
When following the approach of using a random sample together with
equation (6) to optimise the Box-Cox parameters, we will compare the
performance of determining $(\boldsymbol{\lambda}, \boldsymbol{a})$
from the full $N_{\rm p}$-dimensional posterior and from the
$N_{\rm p}$ marginal posteriors.



2.2 Box-Cox-Fisher formalism 

Once the optimal Box-Cox parameters are found by either of the methods
described in the foregoing section, one can proceed to unite the
Box-Cox transformations with the Fisher matrix technique. If the same
set of experimental parameters is used for the Box-Cox-Fisher
prediction as for the fiducial mock likelihood analysis that the
optimal Box-Cox parameters were determined from, one should obtain
identical results. Changing the experimental setup in the
Box-Cox-Fisher forecasts should then yield similarly accurate results,
as long as these parameters do not depart so strongly from those of
the mock likelihood analysis that the shape of the posterior is
modified significantly. The universality with respect to changes in
various parameters of the exemplary weak lensing survey will be tested
in Section 3.

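The task is hence to compute the posterior distribution,
$\mathcal{P}(\boldsymbol{p})$, of model parameters $\boldsymbol{p}$
for a given set of Box-Cox parameters
$(\boldsymbol{\lambda}, \boldsymbol{a})$ and a standard Fisher matrix
$\mathbf{F}^{\rm orig}$, computed at a fiducial point
$\boldsymbol{p}_{\rm fid}$ in parameter space. In analogy to equation
(4) the posterior is given by

$$\mathcal{P}(\boldsymbol{p}) = \sqrt{\frac{\det \bar{\mathbf{F}}}{(2\pi)^{N_{\rm p}}}} \exp \left\{ -\frac{1}{2} \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right)^\tau \bar{\mathbf{F}} \left( \bar{\boldsymbol{p}} - \bar{\boldsymbol{p}}_{\rm max} \right) \right\} \prod_{\mu=1}^{N_{\rm p}} (p_\mu + a_\mu)^{\lambda_\mu - 1} ,$$ (9)

where we used the transformed Fisher matrix $\bar{\mathbf{F}}$ as an
estimator for the inverse covariance of the Gaussian of the Box-Cox
transformed posterior. The peak position
$\bar{\boldsymbol{p}}_{\rm max}$ of this Gaussian and
$\bar{\mathbf{F}}$ are the only unknown quantities in equation (9)
that have yet to be determined.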

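In the following we will assume, as in the standard derivation of the
Fisher matrix, that the prior is uniform in the region of parameter
space where the likelihood deviates significantly from zero. Thus the
log-likelihood is given by $\mathcal{L} = -\ln \mathcal{P}$, and
likewise for the transformed posterior. Then equation (2) is
equivalent to

$$\bar{\mathcal{L}} = \mathcal{L} - \sum_{\mu=1}^{N_{\rm p}} (1 - \lambda_\mu) \ln (p_\mu + a_\mu) .$$ (10)

If we designate $\boldsymbol{p}_{\rm max}$ as the result of an inverse
Box-Cox transformation of $\bar{\boldsymbol{p}}_{\rm max}$ and employ
the definition of the Fisher matrix, we arrive at the following
expression for the transformed Fisher matrix,

$$\bar{F}_{\mu\nu} \equiv \left\langle \frac{\partial^2 \bar{\mathcal{L}}}{\partial \bar{p}_\mu \, \partial \bar{p}_\nu} \right\rangle \bigg|_{\bar{\boldsymbol{p}}_{\rm max}} = (p_{\mu,{\rm max}} + a_\mu)^{1-\lambda_\mu} (p_{\nu,{\rm max}} + a_\nu)^{1-\lambda_\nu} \left\langle \frac{\partial^2 \mathcal{L}}{\partial p_\mu \, \partial p_\nu} \right\rangle \bigg|_{\boldsymbol{p}_{\rm max}} + \delta_{\mu\nu} (1 - \lambda_\mu) (p_{\mu,{\rm max}} + a_\mu)^{1-2\lambda_\mu} \left\langle \frac{\partial \mathcal{L}}{\partial p_\mu} \right\rangle \bigg|_{\boldsymbol{p}_{\rm max}} + \delta_{\mu\nu} \lambda_\mu (1 - \lambda_\mu) (p_{\mu,{\rm max}} + a_\mu)^{-2\lambda_\mu} .$$ (11)

Here, angular brackets denote expectation values, and
$\delta_{\mu\nu}$ is the Kronecker symbol.

At this point we make the simplifying assumption that
$\boldsymbol{p}_{\rm max} \approx \boldsymbol{p}_{\rm fid}$, i.e. that
the Box-Cox transformation maps the peak of the original posterior
onto the peak of the transformed posterior. As will be demonstrated
below, this approximation holds to high accuracy. Alternatively, one
could instead Taylor-expand the expectation values of the first and
second derivatives of $\mathcal{L}$ in equation (11), but this step
would necessitate the computation of third-order derivatives of
$\mathcal{L}$ already at the first order of the expansion.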
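
The chain-rule identity that relates derivatives with respect to
$\bar{p}$ to derivatives with respect to $p$ can be checked
numerically in one dimension. The sketch below assumes
$\bar{\mathcal{L}} = \mathcal{L} - (1 - \lambda) \ln (p + a)$ and
${\rm d}\bar{p} / {\rm d}p = (p + a)^{\lambda - 1}$, constructs a toy
$\mathcal{L}(p)$ whose Box-Cox transform is exactly Gaussian with
curvature `F_BAR`, and recovers that curvature from finite-difference
derivatives of $\mathcal{L}$ at $p_{\rm max}$. All numerical values
and names are arbitrary choices for illustration.

```python
import math

LAM, SHIFT = 0.5, 0.1            # illustrative Box-Cox parameters
F_BAR, P_BAR_MAX = 40.0, -0.3    # curvature and peak of the Gaussian


def box_cox(p):
    return ((p + SHIFT) ** LAM - 1.0) / LAM


def inverse_box_cox(p_bar):
    return (LAM * p_bar + 1.0) ** (1.0 / LAM) - SHIFT


def neg_log_posterior(p):
    """L(p) = L_bar(p_bar) + (1 - lambda) ln(p + a), Gaussian L_bar."""
    d = box_cox(p) - P_BAR_MAX
    return 0.5 * F_BAR * d * d + (1.0 - LAM) * math.log(p + SHIFT)


def transformed_curvature(p, h=1e-4):
    """d^2 L_bar / d p_bar^2 expressed through derivatives of L(p)."""
    first = (neg_log_posterior(p + h) - neg_log_posterior(p - h)) / (2.0 * h)
    second = (neg_log_posterior(p + h) - 2.0 * neg_log_posterior(p)
              + neg_log_posterior(p - h)) / h ** 2
    u = p + SHIFT
    return (u ** (2.0 - 2.0 * LAM) * second
            + (1.0 - LAM) * u ** (1.0 - 2.0 * LAM) * first
            + LAM * (1.0 - LAM) * u ** (-2.0 * LAM))


p_max = inverse_box_cox(P_BAR_MAX)
recovered = transformed_curvature(p_max)   # should reproduce F_BAR
```

Note that the first derivative of $\mathcal{L}$ does not vanish at
$p_{\rm max}$ in this toy case, which is why all three terms are
needed to recover the curvature exactly.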

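Replacing $\boldsymbol{p}_{\rm max}$ by $\boldsymbol{p}_{\rm fid}$ in
equation (11), the expectation of the first derivative of the
log-likelihood vanishes because it has a maximum at
$\boldsymbol{p}_{\rm fid}$. Invoking the definition of the standard
Fisher matrix for the original distribution, one obtains

$$\bar{F}_{\mu\nu} = (p_{\mu,{\rm fid}} + a_\mu)^{1-\lambda_\mu} (p_{\nu,{\rm fid}} + a_\nu)^{1-\lambda_\nu} F^{\rm orig}_{\mu\nu} + \delta_{\mu\nu} \lambda_\mu (1 - \lambda_\mu) (p_{\mu,{\rm fid}} + a_\mu)^{-2\lambda_\mu} .$$ (12)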

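We pursue two approaches to determine $\boldsymbol{p}_{\rm max}$ or,
equivalently, $\bar{\boldsymbol{p}}_{\rm max}$. Requiring that the
transformed posterior peaks at $\bar{\boldsymbol{p}}_{\rm max}$ yields
the condition

$$\left\langle \frac{\partial \bar{\mathcal{L}}}{\partial \bar{p}_\mu} \right\rangle \bigg|_{\bar{\boldsymbol{p}}_{\rm max}} = (p_{\mu,{\rm max}} + a_\mu)^{1-\lambda_\mu} \left\langle \frac{\partial \mathcal{L}}{\partial p_\mu} \right\rangle \bigg|_{\boldsymbol{p}_{\rm max}} - (1 - \lambda_\mu) (p_{\mu,{\rm max}} + a_\mu)^{-\lambda_\mu} = 0 ,$$ (13)

which can be numerically solved after Taylor-expanding the expectation
value around $\boldsymbol{p}_{\rm fid}$,

$$\left\langle \frac{\partial \mathcal{L}}{\partial p_\mu} \right\rangle \bigg|_{\boldsymbol{p}_{\rm max}} \approx \sum_{\nu=1}^{N_{\rm p}} F^{\rm orig}_{\mu\nu} \left( p_{\nu,{\rm max}} - p_{\nu,{\rm fid}} \right) .$$ (14)

Alternatively, one can determine $\bar{\boldsymbol{p}}_{\rm max}$ such
that the original distribution $\mathcal{P}(\boldsymbol{p})$ peaks at
$\boldsymbol{p}_{\rm fid}$, which, using equation (10), leads to the
condition

$$\frac{\partial \mathcal{L}}{\partial p_\mu} \bigg|_{\boldsymbol{p}_{\rm fid}} = (p_{\mu,{\rm fid}} + a_\mu)^{\lambda_\mu - 1} \times \left\{ \sum_{\nu=1}^{N_{\rm p}} \bar{F}_{\mu\nu} \left( \bar{p}_{\nu,{\rm fid}} - \bar{p}_{\nu,{\rm max}} \right) - (\lambda_\mu - 1) (p_{\mu,{\rm fid}} + a_\mu)^{-\lambda_\mu} \right\} = 0 .$$ (15)

After inserting the approximation given by equation (12), one obtains
an expression that can analytically be solved for
$\bar{\boldsymbol{p}}_{\rm max}$. Since both procedures involve
approximations, we will compare their performance below in
Section 3.2.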
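
The forecasting steps can be put together in a short sketch for two
parameters. It is meant as an illustration under stated assumptions
rather than as the implementation used in this work: the transformed
Fisher matrix is taken as
$\bar{F}_{\mu\nu} = u_\mu^{1-\lambda_\mu} u_\nu^{1-\lambda_\nu} F^{\rm orig}_{\mu\nu} + \delta_{\mu\nu} \lambda_\mu (1 - \lambda_\mu) u_\mu^{-2\lambda_\mu}$
with $u_\mu = p_{\mu,{\rm fid}} + a_\mu$, the peak follows from
solving $\bar{\mathbf{F}} (\bar{\boldsymbol{p}}_{\rm fid} - \bar{\boldsymbol{p}}_{\rm max}) = \boldsymbol{v}$
with $v_\mu = (\lambda_\mu - 1) u_\mu^{-\lambda_\mu}$, and the entries
of `f_orig` are made-up numbers. Function names are hypothetical.

```python
import math


def box_cox(p, lam, a):
    return math.log(p + a) if lam == 0.0 else ((p + a) ** lam - 1.0) / lam


def transformed_fisher(f_orig, p_fid, lams, shifts):
    """Rescale the standard Fisher matrix and add the diagonal term."""
    n = len(p_fid)
    u = [p_fid[m] + shifts[m] for m in range(n)]
    f_bar = [[u[m] ** (1.0 - lams[m]) * u[k] ** (1.0 - lams[k]) * f_orig[m][k]
              for k in range(n)] for m in range(n)]
    for m in range(n):
        f_bar[m][m] += lams[m] * (1.0 - lams[m]) * u[m] ** (-2.0 * lams[m])
    return f_bar


def solve_2x2(matrix, rhs):
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return [(matrix[1][1] * rhs[0] - matrix[0][1] * rhs[1]) / det,
            (matrix[0][0] * rhs[1] - matrix[1][0] * rhs[0]) / det]


def transformed_peak(f_bar, p_fid, lams, shifts):
    """Peak of the transformed Gaussian such that P(p) peaks at p_fid."""
    rhs = [(lams[m] - 1.0) * (p_fid[m] + shifts[m]) ** (-lams[m])
           for m in range(2)]
    delta = solve_2x2(f_bar, rhs)          # equals p_bar_fid - p_bar_max
    return [box_cox(p_fid[m], lams[m], shifts[m]) - delta[m] for m in range(2)]


def log_posterior(p, f_bar, p_bar_max, lams, shifts):
    """ln P(p): Gaussian in p_bar times the Jacobian factor (two parameters)."""
    d = [box_cox(p[m], lams[m], shifts[m]) - p_bar_max[m] for m in range(2)]
    quad = sum(d[m] * f_bar[m][k] * d[k] for m in range(2) for k in range(2))
    det = f_bar[0][0] * f_bar[1][1] - f_bar[0][1] * f_bar[1][0]
    log_norm = 0.5 * math.log(det) - math.log(2.0 * math.pi)
    log_jac = sum((lams[m] - 1.0) * math.log(p[m] + shifts[m]) for m in range(2))
    return log_norm - 0.5 * quad + log_jac


f_orig = [[4000.0, 3000.0], [3000.0, 2500.0]]   # hypothetical Fisher matrix
p_fid = [0.25, 0.9]
lams, shifts = [-0.03, 0.87], [-0.03, -0.46]    # example Box-Cox parameters
f_bar = transformed_fisher(f_orig, p_fid, lams, shifts)
p_bar_max = transformed_peak(f_bar, p_fid, lams, shifts)
```

Evaluating `log_posterior` on a grid around `p_fid` then yields the
non-Gaussian forecast; only `f_orig` has to be recomputed when the
experimental setup changes.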

Gaussian priors can be added to the diagonal of
$\mathbf{F}^{\rm orig}$ in the same way as for the standard Fisher
analysis, but if the priors modify the posterior substantially, they
also have to be included in the mock likelihood analysis used to find
optimal Box-Cox parameters. Note that, when grid or Monte-Carlo
sampling this likelihood, one usually defines a maximum range in which
the model parameters are allowed to vary. This corresponds to an
implicit top-hat prior which cannot be represented in the Fisher
matrix formalism. Hence, one has to make sure that the posterior used
to determine Box-Cox parameters lies well within the parameter space
considered.



3 PERFORMANCE 

To assess the performance of Fisher matrix forecasts com- 
bined with Box-Cox transformations, we consider a mock 
weak lensing survey as outlined in the following. While the 
modelling is at a level of realism similar to current predic- 
tions for planned observational projects, we do not attempt 
to mimic any particular survey, but rather choose the survey 
characteristics such that we obtain a posterior distribution 
of cosmological parameters which serves as a particularly 
useful benchmark. 

Hence, our mock survey will produce a pronounced non-linear degeneracy
between the parameters $\Omega_{\rm m}$, the matter density, and
$\sigma_8$, the normalisation of matter density fluctuations, as an
ideal test case. Note that actual future weak lensing surveys will
generate much stronger parameter constraints and a reduced
$\Omega_{\rm m}$-$\sigma_8$ degeneracy, so that the Box-Cox-Fisher
formalism should perform well in these cases once it does so for the
scenario studied in this work.

We will then investigate in detail the implementation outlined in
Section 2 before answering the question of how accurate the
Box-Cox-Fisher formalism is when varying the fiducial cosmology,
survey parameters, the weak lensing statistic entering the likelihood,
and the dimension of the posterior distribution. To be of practical
use, the proposed method has to capture the change in the posterior
distribution caused by all these variations. Only then can the
formalism be employed for efficient forecasting of parameter
constraints after a single initial full mock likelihood analysis
needed to determine the Box-Cox parameters.



3.1 Mock weak lensing survey 

Weak lensing surveys measure the shapes of millions of distant galaxy
images which undergo tiny modifications when the light emitted by
these galaxies is gravitationally lensed on its way to Earth.
Correlating the shapes of pairs of galaxies, one can infer the
statistical properties of the matter distribution projected along the
line of sight, which in turn depends on the cosmological model. In
addition, the weak lensing signal depends on the distances between
observer, the structures acting as lenses, and the source galaxy,
which provides information about the expansion history of the
Universe. For details about gravitational lensing theory we refer the
reader to Bartelmann & Schneider (2001); for a recent review on weak
lensing measurements see e.g. Munshi et al. (2008).

While the majority of weak lensing studies use two-point correlation
functions as the observable (see Section 3.4), predictions generally
rely on Fourier space measures due to their direct connection to
theory and their simple covariance properties. The power spectrum of
the dimensionless projected mass density $\kappa$ reads (Kaiser 1992)

$$C_\kappa(\ell) = \frac{9 H_0^4 \Omega_{\rm m}^2}{4 c^4} \int_0^{\chi_{\rm hor}} {\rm d}\chi \; \frac{g^2(\chi)}{a^2(\chi)} \; P_\delta \left( \frac{\ell}{\chi}, \chi \right) ,$$ (16)

where $\ell$ denotes angular frequency, $H_0$ the Hubble constant, $c$
the speed of light, and $a$ the cosmological scale factor. The
integral runs over comoving distance $\chi$ up to the horizon distance
$\chi_{\rm hor}$. The power spectrum of the three-dimensional matter
distribution is given by $P_\delta$, which depends on wavenumber
$k = \ell / \chi$ and epoch, specified in terms of comoving distance.
The geometrical contributions to equation (16) are collected in the
lensing efficiency

$$g(\chi) = \int_\chi^{\chi_{\rm hor}} {\rm d}\chi' \; n_{\rm g} \left[ z(\chi') \right] \left( 1 - \frac{\chi}{\chi'} \right) ,$$ (17)

where $n_{\rm g}(z)$ denotes the normalised redshift distribution of
galaxies in the survey.

We use $C_\kappa(\ell)$ as our weak lensing observable and evaluate it
for 100 angular frequency bins, logarithmically spaced between
$\ell_{\rm min} = 10$ and $\ell_{\rm max} = 10^4$. The fiducial
cosmology used in our calculations is set to $\Omega_{\rm m} = 0.25$,
$\sigma_8 = 0.9$, the baryon density $\Omega_{\rm b} = 0.05$, the
power-law exponent of the initial matter power spectrum generated by
inflation $n_{\rm s} = 1.0$, and the Hubble parameter $h = 0.7$, where
$H_0 = h \, 100$ km/s/Mpc. Moreover the geometry of the Universe is
assumed flat by default. To compute $P_\delta$, we employ the transfer
function by Eisenstein & Hu (1998) and apply the corrections due to
non-linear evolution by Peacock & Dodds (1996).

The projected surface mass density is assumed to be Gaussian
distributed, which implies that the covariance is given by (Joachimi
et al. 2008)

$${\rm Cov} \left[ C_\kappa(\ell), C_\kappa(\ell') \right] = \delta_{\ell \ell'} \, \frac{4\pi}{A \, \ell \, \Delta\ell} \left( C_\kappa(\ell) + \frac{\sigma_\epsilon^2}{2 n_{\rm g}} \right)^2 ,$$ (18)

i.e. different angular frequencies are uncorrelated^2. Here,
$\Delta\ell$ is the width of the angular frequency bin, and
$A = 100$ deg$^2$ the survey size. The random orientations of the
intrinsic shapes of source galaxies yield a shape noise contribution
to equation (18), determined by the intrinsic ellipticity dispersion
$\sigma_\epsilon = 0.35$ and the total number density of galaxies on
the sky $n_{\rm g} = 20$ arcmin$^{-2}$. We have implemented a redshift
distribution of the form

$$n_{\rm g}(z) \propto z^2 \exp \left\{ - \left( z / z_0 \right)^{1.5} \right\} ,$$ (19)

where the characteristic redshift scale $z_0$ is related to the median
redshift via $z_0 \approx z_{\rm med} / 1.4$. The survey is assumed to
have a median redshift $z_{\rm med} = 0.9$.
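
The relation between the characteristic scale and the median redshift
can be verified numerically. The short sketch below is an
illustration, with function names chosen for this example: it
integrates $z^2 \exp \{ -(z/z_0)^{1.5} \}$ with the trapezoidal rule
and locates the redshift at which the cumulative distribution reaches
one half.

```python
import math


def n_g(z, z0):
    """Unnormalised redshift distribution z^2 exp[-(z / z0)^1.5]."""
    return z ** 2 * math.exp(-((z / z0) ** 1.5))


def median_redshift(z0, z_max_factor=10.0, steps=100000):
    """Median of n_g from a trapezoidal cumulative integral up to 10 z0."""
    dz = z_max_factor * z0 / steps
    cumulative = [0.0]
    for i in range(steps):
        area = 0.5 * (n_g(i * dz, z0) + n_g((i + 1) * dz, z0)) * dz
        cumulative.append(cumulative[-1] + area)
    half = 0.5 * cumulative[-1]
    for i in range(1, steps + 1):
        if cumulative[i] >= half:
            # Linear interpolation inside the bracketing cell.
            frac = (half - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return (i - 1 + frac) * dz
    return z_max_factor * z0


z0 = 0.9 / 1.4
z_med = median_redshift(z0)
ratio = z_med / z0
```

The ratio does not depend on $z_0$, so the same factor applies to any
median redshift considered when the survey depth is varied.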

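Following widespread practice, we make use of a Gaussian likelihood
for $C_\kappa(\ell)$,

$$L(\boldsymbol{p}) \propto \exp \left\{ -\frac{1}{2} \sum_{\ell, \ell'} \left[ C_\kappa(\ell; \boldsymbol{p}) - C_\kappa(\ell; \boldsymbol{p}_{\rm fid}) \right] {\rm Cov}^{-1} \left[ C_\kappa(\ell), C_\kappa(\ell') \right] \left[ C_\kappa(\ell'; \boldsymbol{p}) - C_\kappa(\ell'; \boldsymbol{p}_{\rm fid}) \right] \right\} ,$$ (20)

where the power spectra obtained for the fiducial cosmology
$\boldsymbol{p}_{\rm fid}$ serve as our mock data vector. We assume
flat priors and make sure that the likelihood peaks well inside the
region of parameter space considered, so that the posterior is readily
obtained from $L$ by renormalisation in parameter space.

Again assuming Gaussianity, the corresponding Fisher matrix reads

$$F^{\rm orig}_{\mu\nu} = \sum_{\ell, \ell'} \frac{\partial C_\kappa(\ell)}{\partial p_\mu} \, {\rm Cov}^{-1} \left[ C_\kappa(\ell), C_\kappa(\ell') \right] \frac{\partial C_\kappa(\ell')}{\partial p_\nu} ,$$ (21)

where both the derivatives and the covariance are evaluated at
$\boldsymbol{p}_{\rm fid}$. In writing equation (21) we have assumed
that the covariance does not depend on cosmology; for the same reason
we keep the covariance in equation (20) fixed at its value for the
fiducial set of cosmological parameters.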
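
For a diagonal covariance the Fisher matrix reduces to a single sum
over angular frequency bins,
$F_{\mu\nu} = \sum_\ell (\partial C / \partial p_\mu) (\partial C / \partial p_\nu) / {\rm Var}(\ell)$.
The sketch below illustrates this with central finite differences. The
power-law `model`, the 10 per cent errors, and the bin range are
assumptions made purely for illustration and stand in for the actual
convergence power spectrum and its covariance.

```python
import math


def model(ell, params):
    """Toy spectrum C(ell) = amp * (ell / 100)^(-slope); not a lensing model."""
    amp, slope = params
    return amp * (ell / 100.0) ** (-slope)


def log_spaced(ell_min, ell_max, n_bins):
    step = math.log(ell_max / ell_min) / (n_bins - 1)
    return [ell_min * math.exp(i * step) for i in range(n_bins)]


def derivative(ell, params, index, step=1e-5):
    """Central finite difference of the model with respect to one parameter."""
    up, down = list(params), list(params)
    up[index] += step
    down[index] -= step
    return (model(ell, up) - model(ell, down)) / (2.0 * step)


def fisher_matrix(ells, params, variances):
    """Fisher matrix for a diagonal, parameter-independent covariance."""
    n = len(params)
    fisher = [[0.0] * n for _ in range(n)]
    for ell, var in zip(ells, variances):
        grads = [derivative(ell, params, m) for m in range(n)]
        for m in range(n):
            for k in range(n):
                fisher[m][k] += grads[m] * grads[k] / var
    return fisher


ells = log_spaced(10.0, 1000.0, 20)
fiducial = [2.0, 1.0]
# Assumed 10 per cent errors per bin, fixed at the fiducial model.
variances = [(0.1 * model(ell, fiducial)) ** 2 for ell in ells]
fisher = fisher_matrix(ells, fiducial, variances)
```

With 10 per cent errors in 20 bins, the amplitude entry is
$20 \times 100 / 2^2 = 500$, which provides a simple check of the
finite-difference derivatives.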

3.2 Comparison of implementations 

For most of the analysis we will only vary $\Omega_{\rm m}$ and
$\sigma_8$ and keep all other cosmological parameters at their
fiducial values. We compute the posterior on a grid in the
$\Omega_{\rm m}$-$\sigma_8$ plane according to equation (20) and also
derive the marginal distributions for the two parameters. In Fig. 1 we
show confidence levels and marginal distributions for the likelihood
analysis as well as for the standard Fisher matrix analysis using
equation (21). While the marginal Fisher matrix errors on
$\Omega_{\rm m}$ and $\sigma_8$ are still relatively close to the
actual results, neither the tails in the marginal distributions, nor
the banana-shaped form of the two-dimensional posterior and the extent
of the confidence contours along the degeneracy can be reproduced by
the standard Fisher matrix.

^2 Note that the assumption of Gaussianity is simplistic, in
particular for high angular frequencies (e.g. Kiessling et al. 2011b),
but still widely used for Fisher matrix forecasts (see Kiessling
et al. 2011a though).

As a first step in the Box-Cox-Fisher formalism we determine the
Box-Cox parameters from the full likelihood, using either the
concentrated maximum likelihood from equation (6) or the QQ-plot
correlation coefficient from equation (8). The latter is restricted to
one-dimensional distributions, i.e. in this case the marginal
distributions of both $\Omega_{\rm m}$ and $\sigma_8$, whereas
$\mathcal{L}_{\rm max}$ is calculated for the individual marginal
distributions as well as for the two-dimensional posterior. To obtain
$\mathcal{L}_{\rm max}$, a random sample of size 10^ is created from
the respective distribution. The optimal values for
$(\boldsymbol{\lambda}, \boldsymbol{a})$ for which
$\mathcal{L}_{\rm max}$ or $r_{\rm QQ}$ attain a maximum are listed in
Table 1.

Working on one- or two-dimensional distributions, with
$\mathcal{L}_{\rm max}$ or $r_{\rm QQ}$ as statistic, results in
largely different optimal values for the Box-Cox parameters. To gain
further insight, we plot both statistics in the plane spanned by
$\lambda$ and $a$ for the marginal distribution of $\sigma_8$ in
Fig. 2, left panel. $\mathcal{L}_{\rm max}$ and $r_{\rm QQ}$ agree
well in the region where they maximise. For a wide range in
$(\lambda, a)$-space this maximum lies on a nearly perfect and almost
linear degeneracy line.

This degeneracy is mirrored in the shape of the Box-Cox transformed
distribution, as can be seen in the right panel of Fig. 2, where we
show the skewness and excess kurtosis of the transformed distribution.
The degeneracy in maximum $\mathcal{L}_{\rm max}$ or $r_{\rm QQ}$ is
closely matched by the minimum skewness with values close to zero. The
kurtosis also features this degeneracy; however, it does not vanish,
but instead obtains a shallow minimum at small negative values along
the degeneracy line. Note that the skewness and kurtosis of the
original distribution can be read off at $\lambda = 1$. In this case
contour lines are horizontal.
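
The behaviour of the two shape statistics under a Box-Cox
transformation can be illustrated with a toy sample. The sketch below
assumes a hypothetical log-normal sample, for which $\lambda = 0$
Gaussianises the distribution, and shows that $\lambda = 1$ (a pure
shift) leaves the skewness of the original sample unchanged. Names are
chosen for this example only.

```python
import math
import random


def box_cox(p, lam, a=0.0):
    return math.log(p + a) if lam == 0.0 else ((p + a) ** lam - 1.0) / lam


def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(3)
sample = [rng.lognormvariate(0.0, 0.5) for _ in range(20000)]

skew_raw, kurt_raw = skewness_and_excess_kurtosis(sample)
skew_shift, _ = skewness_and_excess_kurtosis([box_cox(p, 1.0) for p in sample])
skew_log, kurt_log = skewness_and_excess_kurtosis([box_cox(p, 0.0) for p in sample])
```

Scanning such statistics over a grid in $\lambda$ and $a$ gives maps
of the kind described above, with the region of near-zero skewness
tracing the preferred Box-Cox parameters.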

The mean and variance of the transformed distributions increase along
the degeneracy line for larger values of $\lambda$ and $a$, so that
the degeneracy can be broken by fixing either of the two lowest-order
moments of the transformed distributions. However, since mean and
variance are uncritical for our purposes, we leave them as free
parameters and simply use the $(\lambda, a)$ combinations on the
degeneracy line that our codes produce, the exact values hence
determined by numerical effects and the maximisation algorithm used.
See e.g. the values for $\lambda$ and $a$ in the third and fourth row
of Table 1, which lie in the region of maximum
$\mathcal{L}_{\rm max}$, $r_{\rm QQ}$ and minimum skewness. In the
appendix we provide a toy model that illustrates basic properties of
Box-Cox transformations including the degeneracy between $\lambda$ and
$a$ discussed here.

With optimal values for $\lambda$ and $a$ at hand, we compute the
transformed Fisher matrix as given in equation (12) and subsequently
the transformed posterior according to equation (9). The resulting
confidence contours and marginal distributions, with Box-Cox
parameters obtained from the marginal distributions via $r_{\rm QQ}$
(1D) as well as from the full posterior via $\mathcal{L}_{\rm max}$
(2D), are also shown in Fig. 1. Furthermore we provide a quantitative
statement on how accurately the Box-Cox transformed posterior matches
the actual one by calculating the Kullback-Leibler divergence
$D_{\rm KL}$ as given by equation (7) between the two distributions in
Table 1, again for both the two-dimensional and marginal cases.

Figure 1. Left panel: $1\sigma$ and $2\sigma$ confidence levels in the $\Omega_{\rm m}$-$\sigma_8$ plane for the full likelihood analysis (blue dotted lines), the Box-Cox
transformed posterior based on marginal distributions (red solid lines), and the Box-Cox transformed posterior based on the two-dimensional
distribution (orange solid lines). For comparison the results for a naive Fisher matrix computation are shown as black lines.
Right panels: Same as above, but for the marginalised distributions of $\Omega_{\rm m}$ (bottom) and $\sigma_8$ (top).

Table 1. Kullback-Leibler divergence $D_{\rm KL}$ between the posterior obtained from the full likelihood analysis and the posterior from the
Box-Cox transformed Fisher matrices, using different implementations. Shown is $D_{\rm KL}$ for the marginalised distributions of $\Omega_{\rm m}$ and
$\sigma_8$ in the second and third column, as well as for the distribution in the $\Omega_{\rm m}$-$\sigma_8$ plane in the fourth column. In the fifth to
eighth column the optimum Box-Cox transformation parameters are listed for every implementation. Box-Cox parameters are either determined
from the marginal distributions (1D) or the two-dimensional likelihood (2D). Results when using the different approaches to determining
p_max are also compared. For comparison results are also given for a standard Fisher matrix analysis.

analysis method                    D_KL(Omega_m)  D_KL(sigma_8)  D_KL(Omega_m-sigma_8)  lambda(Omega_m)  a(Omega_m)  lambda(sigma_8)  a(sigma_8)
standard Fisher                    0.143          0.029          3.482
Box-Cox 1D; p_max from eq. (13)    0.008          0.008          0.022                  -0.74            0.03        1.54             0.28
Box-Cox 1D; p_max from eq. (15)    0.008          0.007          0.043                  -0.74            0.03        1.54             0.28
Box-Cox 1D; lambda, a via L_max    0.010          0.008          0.040                  -0.09            -0.08       3.73             4.00
Box-Cox 2D; p_max from eq. (13)    0.013          0.013          0.017                  -0.03            -0.03       0.87             -0.46
Box-Cox 2D; p_max from eq. (15)    0.014          0.013          0.017                  -0.03            -0.03       0.87             -0.46

Both visual and quantitative inspection demonstrate 
that the Box-Cox-Fisher formalism excellently reproduces 
the actual posterior, for all variants of the implementation 
considered. Compared to the standard Fisher results, the 
Box-Cox-Fisher formalism improves Dkl by a factor of 2 
to 4 in the case of the marginal distribution of erg and by 
at least an order of magnitude for the marginal distribution 
of fim. The decrease in Dkl can mainly be ascribed to the 
accurate modelling of the non-Gaussian wings of the dis- 
tributions, but partly also to the shift in the maximum of 



the marginal distributions away from the fiducial cosmology 
which the standard Fisher formalism cannot account for. 

As the left-hand panel in Fig. [1] suggests, the most bla- 
tant discrepancy between the standard and Box-Cox-Fisher 
analysis happens in the — erg plane, with about two or- 
ders of magnitude difference in Dkl . The overall form of the 
posterior is represented accurately by the Box-Cox-Fisher 
contours; only the extent of the 2a confidence levels reveals 
small residual deviations. As expected, if the Box-Cox pa- 
rameters are derived from the marginal distributions, Dkl 
for the marginal distributions is smaller than for the 2D ap- 
proach, and vice versa in the case of Dkl in the fim — erg 
plane. 
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Figure 2. Left panel: Concentrated likelihood Lmax, see equation ||6|, and QQ-plot correlation coefficient tqq, see equation l(8]l, for 
the marginalised distribution of eg, as a function of Box-Cox transformation parameters A and a. Red solid lines correspond to tqq 
and indicate a deviation of 10~^ and 10~* from the maximum of 1. The relative deviation of L^ax from its maximum is shown in grey 
shading, varying logarithmically between 0.1 (white) and 10 ~^ (black). Right panel: Skewness and excess kurtosis of the transformed 
distribution as a function of A and a. Levels of constant skewness are shown in red, indicating values of 0.1, 0.01, -0.01, -0.1 from top 
to bottom. Contours for negative values are dotted. The kurtosis is shown in grey shading, varying linearly between 1 (black) and -0.1 
(white). Levels of zero kurtosis are indicated by the black lines. 
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tion has vanishing skewness and low excess kurtosis, and 
hence can be assumed to be close to a multivariate Gaus- 
sian. Consulting equation this transformed distribution 
should additionally be described well by the transformed 
Fisher matrix, see equation (|12|l . with its peak at p^ax- This 
is illustrated in Fig. [31 and indeed the transformed full pos- 
terior is closely matched by the transformed Fisher matrix 
contours. The slightly more extended confidence contours 
for the full likelihood might hint at a mildly platykurtic 
distribution which agrees with the small negative values of 
excess kurtosis along the degeneracy line in Fig. (2] The abil- 
ity of the Box-Cox transformations to change the posterior 
into a multivariate Gaussian opens up a range of potential 
applications, as we will discuss further in Section (5] 



Figure 3. $1\sigma$ and $2\sigma$ confidence levels in the plane of the Box-Cox
transformed parameters. Red contours correspond to the transformed
full likelihood, black contours originate from the transformed
Fisher matrix given by equation (12) and centred at $p_{\rm max}$.
The posteriors are Gaussian to good approximation and agree
well. The results shown were obtained for the case which is shown
in Fig. 1 as orange lines.



Note that we have also compared the algorithms given
by equations (13) and (15) to calculate $p_{\rm max}$, see Table 1. Both
perform equally well, but since equation (15) can be solved
analytically for $p_{\rm max}$, we will employ this version henceforth.
Moreover, we are going to apply the 2D approach, i.e. determining
the Box-Cox parameters from the full posterior via
$L_{\rm max}$, for the remainder of this paper.

As seen in Fig. 2, right panel, the optimal choice of Box-
Cox parameters guarantees that the transformed distribu- 



3.3 Varying cosmology and survey parameters 

We expect the Box-Cox-Fisher formalism to be particularly 
useful in an advanced planning stage of an experiment when 
e.g. the capabilities of breaking model parameter degenera- 
cies come into focus. By then the survey parameters and the 
analysis strategies should not change radically anymore, but 
only in relatively small steps and only a few parameters at 
a time. If that holds true, the general form of the posterior 
is only moderately modified under these changes, so that 
one can continue to use the optimal Box-Cox parameters 
determined from the initial full likelihood analysis. 

The Box-Cox-Fisher analysis is repeated for several sur- 
vey configurations that each differ in one or two parameters 
from the fiducial survey by about 10 %. These changes are 
accounted for in the Fisher matrix, but we retain the val- 
ues of the Box-Cox parameters determined for the fiducial 
survey. A full likelihood analysis is computed as well for ev- 
Table 2. Kullback-Leibler divergence $D_{\rm KL}$ between the posterior obtained from the full likelihood analysis and the posterior from the
Box-Cox transformed Fisher matrices, varying different survey or cosmological parameters as indicated in the first column. Shown is
$D_{\rm KL}$ for the distribution in the $\Omega_{\rm m}$-$\sigma_8$ plane in the second column, as well as the marginalised distributions for $\Omega_{\rm m}$ and $\sigma_8$ in the third
and fourth column.



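parameters changed                                                          $\Omega_{\rm m}$-$\sigma_8$ plane    $\Omega_{\rm m}$    $\sigma_8$

fiducial parameters                                                         0.013    0.013    0.017
$\Omega_{\rm m}$: 0.25 -> 0.225                                             0.008    0.008    0.025
$z_{\rm med}$: 0.9 -> 1.0; $n_{\rm g}$: 20 -> 37.4 arcmin$^{-2}$            0.005    0.005    0.019
$\ell_{\rm max}$: 10000 -> 8700                                             0.015    0.015    0.019
$A_{\rm s}$: 100 -> 110 deg$^2$                                             0.013    0.012    0.016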



ery configuration, but solely for the purpose of assessing the 
accuracy of the forecast. 

We modify the fiducial cosmology by lowering $\Omega_{\rm m}$ by
10%. A slightly deeper survey is analysed, increasing $z_{\rm med}$
to 1, which also increases the number density of galaxies
and consequently reduces the noise contribution. Applying
the scaling found by Amara & Réfrégier (2007), the deeper
survey has $n_{\rm g} = 37.4\,{\rm arcmin}^{-2}$. Moreover, we consider the
case of discarding the highest angular frequency bins in the
analysis, reducing $\ell_{\rm max}$ to 8700. Finally, we increase the survey
size by 10%.

Analogously to the foregoing section, we employ the
Kullback-Leibler divergence to compare the Box-Cox-Fisher
result with the posterior from the full likelihood analysis. As
is evident from Table 2, $D_{\rm KL}$ for the marginal distributions
and the posterior in the $\Omega_{\rm m}$-$\sigma_8$ plane remains constant to
good approximation in all cases.
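
As an illustration of the comparison statistic, the following Python sketch evaluates the Kullback-Leibler divergence between two posteriors tabulated on a common grid. It is a toy example only: the two Gaussians, the function name, and the choice of which distribution serves as the reference are assumptions, since the text does not restate the exact convention.

```python
import numpy as np

def kl_divergence(p, q, cell_volume):
    """D_KL(p || q) = sum_i p_i ln(p_i / q_i) dV for densities on the same grid.

    Both inputs are normalised internally; grid cells with p = 0 contribute nothing.
    The result is infinite if q vanishes somewhere p does not.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / (p.sum() * cell_volume)
    q = q / (q.sum() * cell_volume)
    mask = p > 0.0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * cell_volume)

# Toy check with two unit-variance Gaussians whose means differ by 0.5.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2)            # stand-in for the full posterior
q = np.exp(-0.5 * (x - 0.5)**2)    # stand-in for the forecast posterior
d_kl = kl_divergence(p, q, dx)
print(d_kl)  # analytic value for this toy case: 0.5**2 / 2 = 0.125
```

The same function works for two-dimensional grids if `cell_volume` is set to the area of a grid cell.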

3.4 Varying statistic and posterior dimension 

One of the most likely modifications in mock weak lensing
analyses is a change in the statistic used as the observable.
We switch to the frequently employed correlation
function $\xi_+$, which is related to the power spectrum via
(Schneider et al. 2002)

$$\xi_+(\theta) = \int_0^\infty \frac{{\rm d}\ell\,\ell}{2\pi}\, J_0(\ell\theta)\, C_\kappa(\ell) \;, \qquad (22)$$

where $J_0$ is the Bessel function of the first kind of order
0. The covariance of the correlation function can directly be
determined from equation (18), as detailed in Joachimi et al.
(2008). We intend to use roughly the same angular scales as
in the power spectrum analysis and thus consider the range
1 arcmin < $\theta$ < 5 deg, divided into 50 logarithmically spaced
bins. Note that this range of angular scales does not ensure a
similar information content because the angular separation
bins are strongly correlated.
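
The transformation in equation (22) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the Gaussian-shaped toy power spectrum, the grid sizes, and the function names are hypothetical, and $J_0$ is evaluated from its integral representation to keep the example free of external dependencies beyond NumPy.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def bessel_j0(x, n_phi=401):
    """J_0(x) = (1/pi) * int_0^pi cos(x sin(phi)) dphi, evaluated numerically."""
    phi = np.linspace(0.0, np.pi, n_phi)
    return trapezoid(np.cos(np.multiply.outer(x, np.sin(phi))), phi) / np.pi

def xi_plus(theta, ell, c_ell):
    """xi_+(theta) = int dl l/(2 pi) J_0(l theta) C(l), on a linear ell grid."""
    theta = np.atleast_1d(theta)
    kernel = bessel_j0(np.multiply.outer(theta, ell))  # shape (n_theta, n_ell)
    return trapezoid(kernel * ell * c_ell / (2.0 * np.pi), ell)

# Toy spectrum C(l) = exp(-b l^2), for which the transform is known in closed
# form: xi_+(theta) = exp(-theta^2 / (4 b)) / (4 pi b).
b = 0.01
ell = np.linspace(0.0, 50.0, 1001)
theta = np.array([0.05, 0.1, 0.3])
xi_numeric = xi_plus(theta, ell, np.exp(-b * ell**2))
xi_analytic = np.exp(-theta**2 / (4.0 * b)) / (4.0 * np.pi * b)
print(xi_numeric)
print(xi_analytic)
```

For a realistic convergence power spectrum extending to $\ell \sim 10^4$ a brute-force quadrature like this becomes inefficient, and a dedicated Hankel-transform scheme would be preferable.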

Moreover, we now use Population Monte-Carlo sampling
with CosmoPMC^1 (Cappé et al. 2008; Wraith et al. 2009)
to create a random sample of size lO'^ from the full pos- 
terior in order to determine optimal Box-Cox parameters 
via equation (6). The results are presented in Fig. 4, finding
again excellent agreement between Box-Cox-Fisher results
and full posterior. If the optimal Box-Cox parameters that
were obtained for the power spectrum analysis in Section 3.2

^1 http://www2.iap.fr/users/kilbinge/CosmoPMC/



are used instead, one arrives at constraints of similar quality.
Therefore the Box-Cox-Fisher formalism should also be
robust with respect to a change in the weak lensing statistic
employed in the Fisher matrix.

As a final test for the practical applicability of the novel 
forecasting method, we have to verify that it is accurate for a 
higher-dimensional posterior. Hence we drop the assumption 
of a spatially flat Universe and vary the density parameter 
of dark energy, $\Omega_\Lambda$, as well as $n_{\rm s}$ in addition to $\Omega_{\rm m}$ and
$\sigma_8$. We perform the analysis as in the foregoing case, using
again CosmoPMC to create about 10^ random samples of 
the four-dimensional posterior to determine in total 8 Box- 
Cox parameters. 

The confidence contours of the marginalised posterior
distributions for all possible pairs of cosmological parameters
are shown in Fig. 5. The Box-Cox-Fisher formalism
yields contours that are able to adopt arbitrary shapes
and represent the four-dimensional posterior accurately, including
the non-linear degeneracies in the $\Omega_{\rm m}$-$\sigma_8$ and
$\Omega_\Lambda$-$n_{\rm s}$ planes. The only significant discrepancies between
the Fisher-based contours and the confidence levels derived
from the Monte-Carlo sample appear in regions where the
posterior declines slowly, e.g. for large $\Omega_{\rm m}$ or small $\sigma_8$. The
frayed contour lines indicate that these regions are still 
sparsely sampled by CosmoPMC. This could imply that the 
Monte-Carlo sample is not suited to allow for a determina- 
tion of optimal Box-Cox parameters which lead to an ac- 
curate posterior shape in these regions. Alternatively, the 
Box-Cox-Fisher formalism might well be robust enough to 
produce a precise representation of the posterior also where 
it is shallow, so that the difference in contour lines would 
be caused by the insufficient Monte-Carlo sampling in that 
regime. 

As a further example of the reliability of the Box-Cox-
Fisher formalism, we initially observed a slight tilt of the
Box-Cox-Fisher confidence contours against those from the
Monte-Carlo analysis, particularly in the $\Omega_\Lambda$-$n_{\rm s}$ plane,
which could be traced back to a small difference in the
correlation functions computed by CosmoPMC and the authors'
code. The latter was used to produce the fiducial $\xi_+$
which served as the mock datavector input to CosmoPMC.
The small discrepancy in $\xi_+$ leads to a small shift in the
maximum likelihood point as determined by CosmoPMC
away from the fiducial cosmology, as well as slightly different
derivatives of $\xi_+$ with respect to cosmological parameters.
Consequently, the authors' code and CosmoPMC produce
moderately discrepant Fisher matrices. Using the latter in
Figure 4. Left panel: $1\sigma$ and $2\sigma$ confidence levels in the $\Omega_{\rm m}$-$\sigma_8$ plane for the full likelihood analysis (blue dotted lines) and the Box-Cox
transformed posterior (red solid lines), using the shear correlation function as statistic. The optimal Box-Cox parameters obtained
from the power spectrum analysis also yield good results in this case, as indicated by the black solid lines. Right panels: Same as above,
but for the marginalised distributions of $\Omega_{\rm m}$ (bottom) and $\sigma_8$ (top).



the Box-Cox-Fisher analysis instead results in the excellent
agreement shown in Fig. 5.



4 AN APPLICATION: BREAKING
DEGENERACIES IN THE $\Omega_{\rm m}$-$\sigma_8$ PLANE

The Box-Cox-Fisher formalism is applicable to a wide range
of problems. For illustrative purposes we provide in the
following a toy example which is again built around a mock
weak lensing survey. Future experiments which are currently
in the planning stages will not be restricted to measuring
two-point statistics like $C_\kappa(\ell)$, but make use of higher-order
correlations of galaxy shapes. Three-point statistics such as
the bispectrum $B_\kappa(\ell_1, \ell_2, \ell_3)$ have been demonstrated to potentially
tighten cosmological parameter constraints considerably,
e.g. by breaking degeneracies in the $\Omega_{\rm m}$-$\sigma_8$ plane
(Bergé et al. 2010). Since these conclusions rely entirely on
standard Fisher analyses, we set out to investigate whether
the breaking of $\Omega_{\rm m}$-$\sigma_8$ degeneracies is affected by the actual,
non-elliptical shapes of confidence levels for both two-
and three-point weak lensing statistics.

We treat $B_\kappa(\ell_1, \ell_2, \ell_3)$ as our observable three-point
statistic and calculate it via (e.g. Takada & Jain 2004)



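$$B_\kappa(\ell_1,\ell_2,\ell_3) = \int {\rm d}\chi \;\dots\; \times B_\delta\!\left(\frac{\ell_1}{\chi},\frac{\ell_2}{\chi},\frac{\ell_3}{\chi};\chi\right) , \qquad (23)$$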



where $B_\delta$ denotes the matter bispectrum. We employ perturbation
theory (Fry 1983) to compute $B_\delta$ from the matter
power spectrum, applying the corrections due to non-linear
structure evolution given in Scoccimarro & Couchman
(2001). We employ the bispectrum covariance according to
Joachimi et al. (2009), using only the lowest-order term that
is given in terms of power spectra. Noting that the bispectrum
is only non-zero if its three arguments can form the
sides of a triangle (see e.g. Joachimi et al. 2009 for details),
we assemble the datavector out of all such combinations,
where $\ell_1, \ell_2, \ell_3$ can each take 20 logarithmically spaced values
between 10 and 1000.
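
The assembly of the bispectrum datavector can be sketched as below. This is an illustration only: it assumes that each triangle is counted once irrespective of the ordering of its sides and that degenerate (flattened) triangles are retained, neither of which is specified in the text.

```python
import itertools
import numpy as np

# 20 logarithmically spaced angular frequencies between 10 and 1000.
ell = np.logspace(1.0, 3.0, 20)

# combinations_with_replacement on a sorted array yields l1 <= l2 <= l3, so the
# triangle condition reduces to l3 <= l1 + l2 (assumption: equality allowed).
triangles = [
    (l1, l2, l3)
    for l1, l2, l3 in itertools.combinations_with_replacement(ell, 3)
    if l3 <= l1 + l2
]

n_all = len(list(itertools.combinations_with_replacement(ell, 3)))
print(f"{len(triangles)} of {n_all} unordered triples form a triangle")
```

Because the bins are logarithmically spaced, a large fraction of the unordered triples violates the triangle condition, which keeps the datavector considerably shorter than the full set of combinations.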

Performing a full mock likelihood analysis for the bispectrum
is computationally costly, even if only two cosmological
parameters are varied. As a by-product of another
project, we have bispectrum computations on a 20 x 20 grid
in the $\Omega_{\rm m}$-$\sigma_8$ plane at our disposal, albeit for a different cosmology
than the fiducial survey outlined in Section 3.1. The
grid was created for a fiducial cosmology with deviating parameters
$\Omega_{\rm b} = 0.045$, $h = 0.71$, and $n_{\rm s} = 0.963$. Furthermore,
the non-linear correction for the matter power spectrum by
Smith et al. (2003) was used. The bispectra were obtained
for a single source redshift, i.e. the redshift distribution in
equation (19) is replaced by a Dirac delta-distribution peaking
at $z_{\rm s} = 1$. Finally, the angular frequency binning is
slightly different, with 18 bins between 10 and 1500.

We make use of the scaling properties of the Box-Cox- 
Fisher formalism and determine optimal Box-Cox param- 
eters for the bispectrum from a mock likelihood analysis 
based on the gridded bispectra. The changes in cosmology 
Figure 5. $1\sigma$ and $2\sigma$ confidence levels for the full likelihood analysis (blue dotted lines) and the Box-Cox transformed posterior (red
solid lines) for all two-dimensional marginalised distributions in the four-dimensional parameter space $\{\Omega_{\rm m}, \sigma_8, \Omega_\Lambda, n_{\rm s}\}$. Note that the
Fisher matrix as computed by CosmoPMC has been employed in the Box-Cox analysis.



are not more than 10 %. Besides, they only affect parameters
that are kept fixed in this analysis, and to which weak lensing is
less sensitive than to $\Omega_{\rm m}$ and $\sigma_8$. The single source redshift,
$z_{\rm s}$, is similar to the median redshift of the fiducial survey,
so that the lensing efficiency should change only mildly, see
equation (17). Likewise, the different non-linear corrections
and angular frequency coverage should not alter the shape
of the posterior in the $\Omega_{\rm m}$-$\sigma_8$ plane significantly.

In the power spectrum analysis we adopt the Box-Cox 
parameters determined for the fiducial survey. We also use 
the fiducial survey parameters, except for $\ell_{\rm max}$, which is
reduced to 3000. The low angular frequency cut-offs for 
both two- and three-point statistics are meant to exclude 
the deeply non-linear clustering regime and thus improve 
the simplistic approximations for the covariances, partic- 



ularly for the bispectrum. Additionally, we combine the 
constraints from two- and three-point statistics by simply 
adding Fisher matrices or multiplying posteriors, respectively,
i.e. we assume that power spectra and bispectra are
uncorrelated (which, again, is simplistic but common practice,
e.g. Bergé et al. 2010).
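
The combination of constraints in the standard Fisher case can be sketched in a few lines of Python. The two Fisher matrices below are hypothetical numbers chosen only so that their degeneracy directions differ; they are not results of this work.

```python
import numpy as np

def marginal_errors(fisher):
    """Marginalised 1-sigma errors: sqrt of the diagonal of the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))

# Hypothetical 2x2 Fisher matrices for two parameters, with degeneracy lines
# that are tilted against each other.
fisher_2pt = np.array([[400.0, 380.0], [380.0, 400.0]])
fisher_3pt = np.array([[500.0, 200.0], [200.0, 100.0]])

# Assumption stated in the text: the two data sets are uncorrelated, so the
# log-likelihoods, and hence the Fisher matrices, simply add.
fisher_joint = fisher_2pt + fisher_3pt

err_2pt = marginal_errors(fisher_2pt)
err_3pt = marginal_errors(fisher_3pt)
err_joint = marginal_errors(fisher_joint)
print(err_2pt, err_3pt, err_joint)
```

For the Box-Cox-Fisher posteriors the equivalent operation is a multiplication of the two posterior densities on a common parameter grid, followed by renormalisation.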

In Fig. 6 we contrast the parameter constraints in the
$\Omega_{\rm m}$-$\sigma_8$ plane from the standard Fisher matrix and the
Box-Cox-Fisher analysis for two-point statistics, three-point
statistics, and both data sets combined. The posterior for the
bispectrum constraints alone also features the characteristic
$\Omega_{\rm m}$-$\sigma_8$ degeneracy, albeit with a tilted degeneracy line, a
property that is also captured by the standard Fisher matrix
(see also Takada & Jain 2004; Bergé et al. 2010). Since the
intersection of the contours is at a sufficiently large angle,
Figure 6. Combined power spectrum and bispectrum constraints on $\Omega_{\rm m}$ and $\sigma_8$. Left panel: $1\sigma$ and $2\sigma$ confidence levels for a standard
Fisher matrix analysis of two-point weak lensing statistics (red lines), three-point statistics (blue lines), as well as two- and three-point
statistics combined (black lines). Right panel: Same as above, but for constraints resulting from the Box-Cox-Fisher analysis.



Table 3. Marginalised $2\sigma$ constraints on $\Omega_{\rm m}$ and $\sigma_8$ resulting
from the standard Fisher and Box-Cox-Fisher analyses of two-point,
three-point, and combined two- and three-point weak lensing
statistics.
the joint constraints in the Box-Cox-Fisher case produce
fairly elliptical confidence contours which are of similar size
as those resulting from the standard Fisher matrix.

The marginalised constraints on $\Omega_{\rm m}$ and $\sigma_8$ presented
in Table 3 allow for a more quantitative evaluation. Taking
into account the accurate shape of the posterior generally
increases the $2\sigma$ confidence interval substantially. This increase
is stronger the greater the deviation of the posterior
from a Gaussian shape, see e.g. the increase by 50 % for $\Omega_{\rm m}$
in the power spectrum analysis. In the case of the joint two-
and three-point constraints, the absolute change in errors is
smaller, but still the $2\sigma$ confidence interval grows by about
40% (30%) for $\sigma_8$ ($\Omega_{\rm m}$).



5 CONCLUSIONS 

In this work we introduced a novel method to compute precise
predictions for statistical constraints on model parameters
from future experiments. By combining two generic
statistical tools, the Fisher matrix and Box-Cox transformations,
we were able to drop the assumption of Gaussianity
in parameter space. Applying Box-Cox transformations to
model parameters, one arrives at approximately multivari- 
ate Gaussian shapes of the posterior. In this transformed 
space the Fisher matrix can be computed without suffering 
from the usual limits of the Gaussian assumption. An in- 
verse Box-Cox transformation of the Fisher matrix results 
then yields realistic posterior distributions in the original 
parameter space. 

We derived the formalism of the combined Fisher and
Box-Cox analysis and detailed different approaches to determining
the parameters of the Box-Cox transformation from
an initial likelihood analysis. Utilising a mock weak lensing
survey, we verified the accuracy of the Box-Cox-Fisher
formalism and demonstrated that it robustly accounts for
changes in various survey parameters and analysis steps. We
expect the method to be particularly useful in the advanced
planning stages of upcoming experiments and surveys, e.g.
to fine-tune the design with respect to the anticipated parameter
constraints, or to quantify the breaking of model
parameter degeneracies when combining data sets.

A practical implementation of the Box-Cox-Fisher for- 
malism can look as follows: 

(i) Obtain information about the full likelihood for a fidu- 
cial experiment, for instance from a gridded likelihood in 
parameter space or via Monte-Carlo sampling. 

(ii) Determine the optimal Box-Cox parameters using the
statistics $L_{\rm max}$ or $r_{\rm QQ}$, see equations (6) and (8).

(iii) Calculate the standard Fisher matrix for the exact 
experimental setup one is interested in. 

(iv) Compute the posterior via equations (9), (12), and (15).

The last two steps can be repeated as required for arbitrary 
values of experimental parameters, as long as these changes 
do not alter the shape of the posterior too strongly from the 
initial likelihood analysis. This might for instance happen if
several experimental parameters are varied substantially at
the same time. Unfortunately, adding new model parameters
to the analysis also potentially modifies the posterior significantly,
depending on the correlation of this new parameter
with the existing ones. Therefore we consider it unlikely that
Box-Cox parameters can be determined to sufficient accuracy
from low-dimensional sub-spaces of the posterior distribution.
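
Steps (iii) and (iv) of the recipe above can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the equations of this paper: the standard Box-Cox form $[(p+a)^\lambda - 1]/\lambda$ is used, the Fisher matrix is transformed with the generic chain rule for a diagonal Jacobian, the Gaussian in transformed space is centred on the transformed fiducial point (rather than on the peak position given by equation (15)), and all numerical values and function names are hypothetical.

```python
import numpy as np

def box_cox(p, lam, a):
    """Element-wise Box-Cox transform of parameter vectors (last axis = parameters)."""
    p, lam, a = (np.asarray(v, dtype=float) for v in (p, lam, a))
    shifted = p + a
    safe_lam = np.where(lam == 0.0, 1.0, lam)   # avoid division by zero
    power = (shifted**safe_lam - 1.0) / safe_lam
    return np.where(lam == 0.0, np.log(shifted), power)

def jacobian_diag(p, lam, a):
    """Derivative of each transformed parameter: (p_i + a_i)^(lam_i - 1)."""
    p, lam, a = (np.asarray(v, dtype=float) for v in (p, lam, a))
    return (p + a) ** (lam - 1.0)

def transformed_fisher(fisher, p_fid, lam, a):
    """Chain rule for a diagonal Jacobian: Fbar_ij = F_ij / (J_i J_j)."""
    j = jacobian_diag(p_fid, lam, a)
    return np.asarray(fisher, dtype=float) / np.outer(j, j)

def log_posterior(p, fisher, p_fid, lam, a):
    """Unnormalised log-posterior in the original parameters.

    Gaussian in the transformed parameters plus the log-Jacobian that converts
    the density back to the original parameter space.
    """
    f_bar = transformed_fisher(fisher, p_fid, lam, a)
    d = box_cox(p, lam, a) - box_cox(p_fid, lam, a)
    gauss = -0.5 * np.einsum("...i,ij,...j->...", d, f_bar, d)
    return gauss + np.sum(np.log(jacobian_diag(p, lam, a)), axis=-1)

# Hypothetical inputs: fiducial point, Box-Cox parameters from step (ii),
# and Fisher matrices for two survey setups from step (iii).
p_fid = np.array([0.25, 0.8])
lam = np.array([0.5, -1.0])
a = np.array([0.0, 0.0])
fisher_fid = np.array([[4000.0, 2500.0], [2500.0, 2000.0]])
fisher_setups = {"fiducial": fisher_fid, "10 % larger area": 1.1 * fisher_fid}

grid = np.stack(np.meshgrid(np.linspace(0.1, 0.5, 41),
                            np.linspace(0.5, 1.2, 36), indexing="ij"), axis=-1)
for name, fisher in fisher_setups.items():
    logp = log_posterior(grid, fisher, p_fid, lam, a)   # step (iv), lam and a reused
    print(name, logp.shape, float(logp.max()))
```

The point of the sketch is the division of labour: the Box-Cox parameters are fixed once, and only the inexpensive Fisher matrix is recomputed for each new survey configuration.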

The price to pay for the dramatically more realistic
posteriors and confidence regions compared to the standard
Fisher analysis is the need for an initial determination of
the Box-Cox parameters which requires detailed information 
about the full posterior distribution. For a realistic number 
of model parameters the sampling or gridded evaluation of 
the likelihood is a computationally expensive step. However, 
complex experiments demand in practice hundreds of fore- 
cast calculations, so that switching to a Box-Cox-Fisher pre- 
diction after an initial full mock likelihood analysis is still 
largely beneficial in terms of computational time. Note that 
the extra calculations required for the new method add only 
marginally to the time the corresponding standard Fisher 
matrix computation takes. 

We illustrated a potential application of the Box-Cox-
Fisher formalism, investigating the effects of precise posterior
modelling on the joint constraints by weak lensing two-
and three-point statistics in the $\Omega_{\rm m}$-$\sigma_8$ plane. We find
that while the shapes of confidence contours for the individual
constraints from power spectra and bispectra change
in a pronounced way from the standard Fisher results, the
joint posterior is compact and close to the Gaussian form
predicted by the standard Fisher matrix, hence confirming
in the simple case we considered that three-point statistics
can indeed break the $\Omega_{\rm m}$-$\sigma_8$ degeneracy to a large extent.
However, marginal errors on the cosmological parameters
increase substantially by up to 50 % when using the
Box-Cox-Fisher analysis instead of standard Fisher matrix
forecasts, which certainly needs to be taken into account for
predictions of precision measurements.

Generally, the more compact a posterior is, the more 
it looks Gaussian. Consequently, the local representation 
around the maximum provided by the Fisher matrix pro- 
vides a good description of the complete posterior shape in 
that case. It should be noted that, in order to provide a chal- 
lenging benchmark to test our method, and to facilitate the 
covariance calculations, we deliberately designed our exem- 
plary weak lensing survey to yield only weak cosmological 
constraints. Future weak lensing surveys will perform much 
better, thereby rendering the Gaussian approximation in pa- 
rameter space more appropriate for predictions. 

Yet, experiments will always be faced with complex posterior
distributions. For instance, upcoming large-area weak
lensing surveys will be used to test modifications of gravity.
A popular parametrisation of deviations from General Relativity
introduces the gravitational slip and a modification of
Newton's constant, where the two parameters are perfectly
degenerate and non-linearly related for weak lensing data
alone (see e.g. Daniel & Linder 2010). Comparing the Fisher
matrix forecasts for these parameters in Guzik et al. (2010)
with the likelihood analyses in Daniel & Linder (2010) and
Song et al. (2010), it is evident that the optimisation of the



survey design for modified gravity measurements will need 
to go beyond the standard Fisher matrix approach. 

For future developments of the Box-Cox-Fisher formal- 
ism it will prove fruitful to continue the data analysis in the 
Box-Cox transformed parameter space, and not transform 
back to the physically motivated model parameters, as done 
in this work. Then one can fully exploit the Gaussian form of 
the posterior and apply the whole arsenal of statistical tools 
that become accurate, or usable in the first place, on Gaus- 
sian distributions. One such application, which will be dealt 
with in a forthcoming publication, is the subsequent decorre- 
lation of the Box-Cox transformed model parameters, which 
may open up the possibility to define statistically indepen- 
dent variables. 

Many steps in statistical data analysis are simplified
or improve in accuracy when working with Gaussian distributions,
so that one can potentially benefit from Box-
Cox transformations in a wide range of problems. For example,
Taylor & Kitching (2010) have developed an analytical
marginalisation technique that works on Gaussian subspaces
of the posterior. Using Box-Cox transformations, one can
transform sub-spaces of the posterior, or the complete posterior,
to a multivariate Gaussian, and moreover assess the non-Gaussianity
of a given model parameter to verify whether a transformation
is required.

Monte-Carlo Markov Chain (MCMC) methods sample
a posterior considerably more efficiently if the distribution
is compact and does not feature low-probability
tails along degeneracy directions. As an example consider
the cosmic microwave background (CMB) likelihood analysis
by Tegmark et al. (2004) who employed the 'natural'
parametrisation suggested by Kosowsky et al. (2002), followed
by the diagonalisation of the parameter covariance
matrix. Analogously, one could use an initial coarse MCMC
sample to determine Box-Cox transformations that render
the posterior approximately Gaussian. After an additional
decorrelation of parameter space the detailed MCMC analysis
could be run with high efficiency on a set of model parameters
which are statistically independent and Gaussian
distributed to good accuracy. This ansatz is applicable to
any kind of likelihood analysis and does not require the existence
of a physically motivated set of natural parameters,
as in the case of the CMB.
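
The idea of Gaussianising and then decorrelating a coarse sample can be sketched as follows. This is a generic whitening via the eigen-decomposition of the sample covariance, shown purely as an illustration; it is not the decorrelation procedure deferred to the forthcoming publication, and the toy sample, for which the logarithm (Box-Cox with $\lambda = 0$, $a = 0$) is the exact Gaussianising transformation, is hypothetical.

```python
import numpy as np

def decorrelate(samples):
    """Rotate and rescale samples of shape (n_samples, n_par) to zero mean, unit covariance."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)          # cov = V diag(eigval) V^T
    return (samples - mean) @ eigvec / np.sqrt(eigval)

# Toy stand-in for a coarse MCMC sample: correlated and skewed (log-normal).
rng = np.random.default_rng(0)
gauss = rng.multivariate_normal([0.0, 0.0], [[0.04, 0.03], [0.03, 0.04]], size=20_000)
coarse_sample = np.exp(gauss)

transformed = np.log(coarse_sample)     # Box-Cox step, here with lambda = 0, a = 0
independent = decorrelate(transformed)  # decorrelation step

print(np.cov(independent, rowvar=False))
```

Uncorrelated parameters are statistically independent only if the joint distribution is Gaussian, which is why the Box-Cox step has to precede the rotation.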

Note furthermore that logarithmic transformations, 
which constitute a special case of Box-Cox transformations, 
of the large-scale matter distribution or the weak lensing 
convergence have recently been shown to enhance the information
content of two-point statistics (Neyrinck et al. 2009;
Seo et al. 2011). The potential of the more general Box-Cox
transformations in this case is currently under investigation.
Figure A1. Upper panel: Illustration of Box-Cox transformations.
Shown are the transformations for $\lambda = 2$ in black and for
$\lambda = 3$ in grey. The set of black lines in the bottom left corner
indicates the transformation of the data set $\{1, \sqrt{2}, \sqrt{3}\}$ with
($\lambda = 2$; $a = 0$), the other set the transformation of the same data
with ($\lambda = 3$; $a = 1.35$). Note that in both cases the transformed
data values are equidistant. Lower panel: Pairs of Box-Cox parameters
$(\lambda, a)$ for which the transformed data has zero skewness.
The black lines correspond to the two cases shown in the upper
panel. Note that the relation between $\lambda$ and $a$ is linear over a
wide range.
APPENDIX A: ILLUSTRATION OF BOX-COX
TRANSFORMATIONS 

In the following we will provide a toy model that illustrates
the principle of Box-Cox transformations, reproducing in
particular the nearly linear degeneracy between the Box-
Cox parameters $(\lambda, a)$ encountered in Section 3.2.

Suppose one wanted to transform the data set
$\{1, \sqrt{2}, \sqrt{3}\}$, using Box-Cox transformations, such that the
transformed data have vanishing skewness. An obvious
choice in this case is to simply square the data, i.e. apply a
Box-Cox transformation with ($\lambda = 2$; $a = 0$). As is demonstrated
in Fig. A1, this indeed renders the transformed data
values equidistant and hence unskewed.

More generally, the requirement of zero skewness implies,
in the case of three elements $\{x_1, x_2, x_3\}$ in the datavector,
the condition $\bar{x}_3 - \bar{x}_2 = \bar{x}_2 - \bar{x}_1$, assuming the transformation
does not change the ordering of the data. As before,
the bar denotes the transformed data values. Inserting the
definition of the Box-Cox transformation as given in equation
(1), one obtains

$$(x_1 + a)^\lambda + (x_3 + a)^\lambda = 2\,(x_2 + a)^\lambda \;. \qquad ({\rm A1})$$

The solutions of this equation for the data set
$\{x_1 = 1; x_2 = \sqrt{2}; x_3 = \sqrt{3}\}$ are plotted in the lower panel
of Fig. A1, revealing a linear relation between $\lambda$ and $a$, except
in the regime a ^ 0.

From this relation one can read off more combinations of
$(\lambda, a)$ that should fulfil equation (A1), e.g. ($\lambda = 3$; $a = 1.35$).
As is shown in the top panel of the figure, shifting the data to
larger values by 1.35 and then taking the third power again
results in an unskewed transformed distribution, albeit with
a larger mean and variance than for the case ($\lambda = 2$; $a = 0$).

In summary, this toy model reproduces the findings
from Fig. 2. Optimally removing the skewness of a data set
(a task which also seems to govern the Gaussianisation of
posteriors considered in this work) determines the Box-Cox
parameters up to a perfect degeneracy which is very close
to linear over a wide range of the $(\lambda, a)$ plane. Along the
degeneracy line the mean and variance of the transformed
distribution vary. Interestingly, even in the simple situation
considered in this appendix, we cannot derive the linear relation
between $\lambda$ and $a$ analytically.
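
Although no closed-form relation is available, equation (A1) is readily solved numerically. The following Python sketch finds the shift $a$ for a given $\lambda$ by bisection; the function names and the bracketing intervals are choices made for this illustration.

```python
import math

X1, X2, X3 = 1.0, math.sqrt(2.0), math.sqrt(3.0)

def a1_residual(a, lam):
    """Left-hand side minus right-hand side of equation (A1)."""
    return (X1 + a)**lam + (X3 + a)**lam - 2.0 * (X2 + a)**lam

def solve_shift(lam, lo, hi, tol=1e-12):
    """Bisection for the shift a at fixed lam; requires a sign change on [lo, hi]."""
    f_lo = a1_residual(lo, lam)
    if f_lo * a1_residual(hi, lam) > 0.0:
        raise ValueError("no sign change in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = a1_residual(mid, lam)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

a_lam2 = solve_shift(2.0, -0.5, 0.5)   # squaring the data: shift consistent with 0
a_lam3 = solve_shift(3.0, 0.5, 3.0)    # about 1.37, close to the 1.35 read off Fig. A1
print(a_lam2, a_lam3)
```

Repeating the call for a sequence of $\lambda$ values traces out the degeneracy line shown in the lower panel of Fig. A1.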



